Mella is a minimalistic dependently typed programming language
and interactive theorem prover implemented in Haskell. Its main purpose 
is to investigate the effective integration of automated theorem provers in 
a pure and simple setting. Such integrations are essential for supporting 
program development in dependently typed languages. We integrate the 
equational theorem prover Waldmeister and test it on more than 800 
proof goals from the TPTP library. In contrast to previous approaches, 
the reconstruction of Waldmeister proofs within Mella is quite robust 
and does not generate a significant overhead to proof search. Mella thus 
yields a template for integrating more expressive theorem provers in more 
sophisticated languages. 
1 Introduction

Dependently typed programming (DTP) languages such as Agda [TD] or Epigram [18]
are currently receiving considerable attention. By combining the
elegance of functional programming with more expressive type systems, they
introduce a new mathematically principled style of program development. In
contrast to traditional functional programming, types are powerful enough to
support detailed specifications of a program's properties. This, however, requires
type-level reasoning that is no longer decidable. DTP languages are at the
same time interactive theorem proving (ITP) systems similar to Nuprl [14] or
Coq [8]. On the one hand, this supports developing programs that are correct by
construction. On the other hand, it puts an additional burden on programmers.

To support program development at an appropriate level of abstraction, it 
is essential that programmers can focus on the more high-level creative aspects 
of proofs, whereas trivial and routine proof tasks are automated. Yet how can 
this be achieved? 

Traditionally, automation is obtained in ITP systems by implementing large 
libraries of tactics, internally verified solvers and sophisticated simplification 
techniques, or by using external solvers as oracles. More recently, external
automated theorem proving (ATP) systems, satisfiability modulo theories (SMT)
solvers and other decision procedures have been integrated in a more
trustworthy way into ITP systems by internally reconstructing proofs provided by the
external tools. A prime example is Isabelle's Sledgehammer tool (cf. [3]), which
includes a relevance filter for selecting hypotheses, an interface for passing proof 
tasks to external tools, and a mechanism for internally reconstructing external 
proofs. 

This approach seems particularly promising for DTP languages where it
could make program development more lightweight and less time consuming.
Unfortunately, ATP integration for DTP languages is not straightforward.
First, state-of-the-art ATP technology is designed for classical reasoning
whereas DTP requires constructive logic. Second, the logical kernels of DTP
languages tend to be much more complex than those of traditional ITP systems,
hence proof reconstruction establishes relatively less trust. Third, proof
reconstruction turns out to be highly inefficient in practice due to proof
normalisation, whereas in theory it should be linear in the size of input proofs [T3].

Due to these issues, ATP integrations for DTP languages certainly deserve 
to be studied in a radically pure and simple setting. This essentially amounts 
to building a simple trustworthy DTP language kernel around an ATP system 
as its most important proof engine. In this paper we focus in particular on the 
communication between ATP and ITP and the efficiency of proof reconstruction. 
Our main contributions are as follows. 

First, we implement the extended calculus of constructions with universes 
as a minimalistic DTP language, called Mella, in Haskell. This includes term 
data types based on de Bruijn indices and a monadic approach to bidirectional 
type checking and inference. 

Second, we design and implement a simple proof scripting language for 
Mella. It is inspired by Isabelle/Isar and Agda. Apart from commands for 
executing interactive proofs it allows calling external ATP systems within the 
Proof General interface [2] . 

Third, we provide interfaces for executing Mella proofs in the ATP system 
Waldmeister and for reconstructing Waldmeister proofs within Mella. Proof 
reconstruction amounts to building a Mella proof term and type checking it; 
proof normalisation is avoided. 

Fourth, we test the performance of the ATP integration on more than 800 
proof tasks from the TPTP library [26]. In contrast to previous approaches,
proof reconstruction is very effective and does not create a significant overhead 
to Waldmeister proof search. However, a small number of proof reconstructions 
currently fail due to dynamic scoping problems. 

In many ways, Mella is still a prototype. The DTP language implemented 
has neither recursion nor data types. It is just expressive enough to support 
proofs in many-sorted first-order constructive logic with equality. But for the 
main purpose of this paper — exploring effective ATP integrations for DTP lang- 
uages — this is certainly no limitation. 

Two particular features of Mella proof reconstruction are that proof search 
and proof normalisation are avoided. Our micro-step reconstruction is in con- 
trast to Isabelle's current macro-step approach based on the internally verified 
ATP system Metis [TB], and it seems more robust and efficient. In contrast 
to Agda or Coq, we only type check the internal proof terms corresponding 
to external proofs. These proof terms provide proof certificates that could be 
further normalised if needed. Whenever type checking succeeds, correctness of 
Mella's inference system guarantees that normalisation is possible. 
2 Calculus of Constructions



This section introduces the basics of the calculus of constructions, which is
Mella's underlying type theory. We assume familiarity with basic type systems
[231 H2]. The type inference rules of this calculus are given in Figure 1;
its details are explained in the remainder of this section.

Figure 1: Typing rules for CCω



The set of terms of the calculus of constructions (CC) is inductively defined
by the following grammar.

$$t ::= x \in V \mid \Pi x : t.\, t \mid \lambda x : t.\, t \mid t\,t$$

Here, $V$ is a set of variables. $\Pi x : t.\, t$ is the dependent product type; it
essentially amounts to universal quantification. $\lambda x : t.\, t$ is lambda abstraction
and $t\,t$ is application. In CC, types themselves are terms. They are distinguished, and
their mutual dependencies are expressed, by the type inference rules. Terms
that are not types are called non-type terms, or briefly terms if the context
allows. A type of a non-type term is called proper, whereas types of types are
called sorts.
Judgements are expressions $\Gamma \vdash t : T$, where $\Gamma$ is an environment that
provides types for variables, $t$ is a term and $T$ a type. They can be proved by
the type inference rules.

In CC, every proper type has sort *, while * is defined to have sort □. 
The dependencies between terms and types for CC can be modelled by the 
set {(*,*),(*,□),(□,*),(□,□)}. The statement (*,*), for instance, says that 
terms may depend on terms; the statement (□,*) says that terms may depend 
on types. 

To make our calculus rich enough for DTP we extend it with universes. The
calculus of constructions with universes, CCω, extends CC with an infinite set
$\Box_0, \dots, \Box_n, \dots$ of sorts [19]. A pure type system (PTS) is given by a triple
$(\mathcal{S}, \mathcal{A}, \mathcal{R})$, where $\mathcal{S}$ is a set of sorts and $\mathcal{A}$ a set of typing relations $s_1 : s_2$
with $s_1, s_2 \in \mathcal{S}$. The set $\mathcal{R}$ consists of triples $(s_1, s_2, s_3)$, where $s_1, s_2, s_3 \in \mathcal{S}$.
This set, in combination with the typing rule T-Pi in Figure 1, controls the
dependencies of terms and types. The PTS for CCω [7] is given by:

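$$\begin{aligned}
\mathcal{S} &= \{*\} \cup \{\Box_i \mid i \in \mathbb{N}\},\\
\mathcal{A} &= \{* : \Box_0\} \cup \{\Box_i : \Box_{i+1} \mid i \in \mathbb{N}\},\\
\mathcal{R} &= \{* \leadsto *,\ * \leadsto \Box_i,\ \Box_i \leadsto * \mid i \in \mathbb{N}\} \cup \{(\Box_i, \Box_j, \Box_{\max(i,j)}) \mid i, j \in \mathbb{N}\}.
\end{aligned}$$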

The notation $s_1 \leadsto s_2$ is shorthand for $(s_1, s_2, s_2)$. $* \leadsto *$ means that terms can
depend on terms, $* \leadsto \Box_i$ means that types can depend on terms (dependent
types) and $\Box_i \leadsto *$ means that terms can depend on types. The set of all triples
$(\Box_i, \Box_j, \Box_{\max(i,j)})$ defines how types are allowed to depend on types. If a type
$\Box_i$ depends on another type $\Box_j$, it must be at the same level in the hierarchy
as the highest type $\Box_{\max(i,j)}$ it depends on.

The syntax of CCω terms is defined by the following grammar, which extends
and refines that for CC:

$$t ::= s \in \mathcal{S} \mid n \in \mathbb{N} \mid x \in V \mid \lambda\, t \mid \Pi\, t.\, t \mid t\,t \mid t :: t.$$

We now use de Bruijn indices to represent variables introduced via $\lambda$ or $\Pi$
binders, whereas top-level declarations are named. Named variables are
written in typewriter font; they are elements of $V$, the set of valid identifiers.
Both named and unnamed variables can have free or bound occurrences. For
example, in the term $\lambda\, 3$, the index 3 is free because it is pointing outside the
term. A term without free de Bruijn indices is called locally closed.
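
The notion of a free index can be made concrete with a small sketch. The fragment below is an illustration only: it uses a cut-down term type rather than Mella's actual Term type, and the names freeIndices and locallyClosed are hypothetical helpers introduced here.

```haskell
-- Cut-down term language (an assumption for illustration, not Mella's Term).
data Term
  = Unnamed Int        -- de Bruijn index
  | Named String       -- top-level name, never counted as a free index
  | Lam Term           -- binds index 0 in its body
  | Pi Term Term       -- binds index 0 in its second component
  | App Term Term
  deriving (Eq, Show)

-- | Indices that point outside the term, reported relative to its root.
freeIndices :: Term -> [Int]
freeIndices = go 0
  where
    -- depth counts the binders crossed so far
    go depth (Unnamed n)
      | n >= depth = [n - depth]
      | otherwise  = []
    go _     (Named _) = []
    go depth (Lam b)   = go (depth + 1) b
    go depth (Pi a b)  = go depth a ++ go (depth + 1) b
    go depth (App f x) = go depth f ++ go depth x

-- | A term is locally closed if no de Bruijn index escapes it.
locallyClosed :: Term -> Bool
locallyClosed = null . freeIndices
```

Under these assumptions, Lam (Unnamed 0) is locally closed, whereas Lam (Unnamed 3) is not, because its index reaches past the single enclosing binder.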

The dependent product type $\Pi A.\, B$ corresponds to the logical statement
$\forall a \in A.\, B(a)$. To prove $\forall a \in A.\, B(a)$ constructively, one needs to show that
for every possible $a \in A$ an inhabitant of $B$ can be constructed. A function
of type $\Pi A.\, B$ is therefore a proof of the statement $\forall a \in A.\, B(a)$. If $B$ does
not depend on $A$, then the dependent product type is $A \to B$. The additional
syntax $t :: t$ is type annotation. It allows us to explicitly state that a given term
has some type.

Because there are two kinds of variables, named and unnamed ones, judgements
take the form $\Gamma; \Delta \vdash t : T$, where $\Gamma$ and $\Delta$ are the typing contexts for
named and unnamed variables. The syntax for these contexts is

$$\Gamma ::= \emptyset \mid \Gamma, x : T, \qquad \Delta ::= \emptyset \mid \Delta, T.$$

Both contexts are lists, but since names must be unique in $\Gamma$, it can be treated
as a set. We use $\emptyset$ to represent empty contexts. We often omit empty contexts
and write $\vdash t : T$ rather than $\emptyset; \emptyset \vdash t : T$. We write $\Gamma, x : T$ to denote that
context $\Gamma$ is extended with the new binding $x : T$, whereas for $\Delta$, only a type
is supplied. We write $x : T \in \Gamma$ to assert that $T$ is the type of $x$ in $\Gamma$. We write
$[k \mapsto t']t$ for the substitution of term $t'$ for the index $k$ in the term $t$.

Unlike variables in $\Gamma$, those in $\Delta$ are nameless and cannot be looked up by
name. Instead we define two lookup operators $!$ and $!!$, with and without index
shifting, to retrieve the types of variables from $\Delta$.

$$(\Delta, T)\,!!\,n = \begin{cases} T & \text{if } n = 0,\\ \Delta\,!!\,(n-1) & \text{otherwise,} \end{cases} \qquad \Delta\,!\,n = \uparrow^{n+1}(\Delta\,!!\,n).$$

$\uparrow^d_c t$ is the $d$-place shift of a term $t$ above cutoff $c$ [53]. We write $\uparrow^d t$ if the cutoff
is zero. An unnamed context is well-formed only if all the terms within it are
either proper types or sorts. Unnamed contexts can be concatenated using the
++ operator.
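
The two lookup operators can be sketched in Haskell as follows. This is a minimal illustration under stated assumptions: the term type is cut down, the context is a plain list with the most recent binding first, and the names lookupRaw and lookupShifted are hypothetical stand-ins for the operators written !! and ! above.

```haskell
-- Cut-down terms (an assumption for illustration, not Mella's Term).
data Term
  = Star
  | Unnamed Int
  | Pi Term Term
  | App Term Term
  deriving (Eq, Show)

-- | d-place shift above cutoff c.
shift :: Int -> Int -> Term -> Term
shift d c (Unnamed n)
  | n < c     = Unnamed n
  | otherwise = Unnamed (n + d)
shift d c (Pi a b)  = Pi (shift d c a) (shift d (c + 1) b)
shift d c (App f x) = App (shift d c f) (shift d c x)
shift _ _ t         = t

-- Unnamed context, most recent binding first (assumed layout).
type Delta = [Term]

-- | Lookup without shifting: the stored type exactly as it was recorded.
lookupRaw :: Delta -> Int -> Maybe Term
lookupRaw (t : _) 0 = Just t
lookupRaw (_ : ts) n
  | n > 0 = lookupRaw ts (n - 1)
lookupRaw _ _ = Nothing

-- | Lookup with shifting: the stored type is moved past the n + 1 bindings
-- that have been added since it was recorded.
lookupShifted :: Delta -> Int -> Maybe Term
lookupShifted ctx n = shift (n + 1) 0 <$> lookupRaw ctx n
```

For the context that first binds a type A : * and then a variable x : A, the list is [Unnamed 0, Star]. Looking up x without shifting yields Unnamed 0, which would wrongly refer to x itself in the current context; with shifting it yields Unnamed 1, which refers to A.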

To implement this calculus, a bidirectional type checker is used. This means
that for any term $t$ of type $T$, one can either infer the type, written $t :_\uparrow T$, or
check that the term has the type, written $t :_\downarrow T$. Type inference requires $t$ and
returns $T$, whereas type checking requires both $t$ and $T$. The two rules T-Inf
and T-Ann (the rule for type annotations) provide a conversion between type
checking and type inference. Bidirectional type checking ensures that the rules
are directly implementable without need for further transformation.
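
The interplay between the two modes is easiest to see on a much smaller calculus. The following sketch is a bidirectional checker for the simply typed lambda calculus with de Bruijn indices; it is not Mella's checker and has no dependent types, and all names in it are chosen for this illustration.

```haskell
-- Simple types and terms (illustration only, no dependent types).
data Ty = Base | Arr Ty Ty deriving (Eq, Show)

data Tm
  = Var Int      -- de Bruijn index
  | Lam Tm       -- unannotated abstraction
  | App Tm Tm
  | Ann Tm Ty    -- type annotation
  deriving (Eq, Show)

type Ctx = [Ty]  -- most recent binding first

-- | Inference mode: takes a term and returns its type if one can be found.
infer :: Ctx -> Tm -> Maybe Ty
infer ctx (Var n)
  | n >= 0 && n < length ctx = Just (ctx !! n)
  | otherwise                = Nothing
infer ctx (Ann t ty)
  | check ctx t ty = Just ty          -- annotations switch to checking mode
  | otherwise      = Nothing
infer ctx (App f x) =
  case infer ctx f of
    Just (Arr a b) | check ctx x a -> Just b
    _                              -> Nothing
infer _ (Lam _) = Nothing             -- an unannotated lambda can only be checked

-- | Checking mode: takes both the term and its tentative type.
check :: Ctx -> Tm -> Ty -> Bool
check ctx (Lam b) (Arr a r) = check (a : ctx) b r
check _   (Lam _) _         = False
check ctx t       ty        = infer ctx t == Just ty  -- fall back to inference
```

The Ann case of infer and the last case of check play the roles that the text assigns to T-Ann and T-Inf: they are the only places where the checker changes direction.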



3 An Extended Calculus 

Mella requires additional features for equational and incremental interactive 
reasoning: identity types and metavariables. We call this extension CC+. 

Firstly, we add identity types to CCω. The identity type $\mathrm{Id}_A(a, b)$ for any
type $A$, where $a, b : A$, denotes that $a$ and $b$ represent identical proofs of
proposition $A$ [50]. This captures propositional equality within Mella and supports
equational reasoning. Our identity type corresponds to the implementation of
propositional equality as an inductive family in Agda. Several new terms need
to be added to the grammar of CCω:

$$t ::= \lambda x.\, t \mid \dots \mid \mathsf{refl} \mid \mathrm{Id}_t(t, t) \mid \mathsf{elimJ}.$$

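The reflexivity term refl works exactly like refl in Agda [21]. It allows the
construction of identity types $\mathrm{Id}_A(a, b)$ where $a =_\beta b$. The typing rules for the
reflexivity term Eq-Refl and the identity term Eq-Id are as follows [3]:

$$\textsc{Eq-Refl}\quad \frac{\Gamma; \Delta \vdash A :_\uparrow s \qquad \Gamma; \Delta \vdash a, b :_\downarrow A \qquad a =_\beta b \qquad s \in \mathcal{S}}{\Gamma; \Delta \vdash \mathsf{refl} :_\downarrow \mathrm{Id}_A(a, b)}$$

$$\textsc{Eq-Id}\quad \frac{\Gamma; \Delta \vdash A :_\uparrow s \qquad \Gamma; \Delta \vdash a, b :_\downarrow A \qquad s \in \mathcal{S}}{\Gamma; \Delta \vdash \mathrm{Id}_A(a, b) :_\uparrow s}$$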

The J rule below, which corresponds to the term elimJ, eliminates identity
types [T5]. It can be used in combination with refl to define the standard
functions of equational logic in Mella, namely substitutivity, congruence,
transitivity and symmetry. Because displaying the J rule with the locally nameless
syntax discussed in Section 2 would render it almost unreadable, we present its
Mella syntax:

theorem elimJ : "(A : *) (C : (x y : A) -> Id A x y -> *)
-> (e : (x : A) -> C x x refl) 
-> (x y : A) (P : Id A x y) -> C x y P". 
"\A C e x y P -> e x" . 
qed . 
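
How the equational combinators arise from a single eliminator can be illustrated at the type level in Haskell. The sketch below is an analogue only, not Mella code: it uses a GADT for propositional equality between Haskell types, takes substitutivity (a non-dependent special case of J) as the primitive, and derives symmetry, transitivity and congruence from it. All names are chosen for this illustration.

```haskell
{-# LANGUAGE GADTs #-}

-- Propositional equality between types, with reflexivity as sole constructor.
data Id a b where
  Refl :: Id a a

-- | Substitutivity: the only function here that inspects Refl directly.
subst :: Id a b -> p a -> p b
subst Refl px = px

-- Helper wrappers that pick the predicate p for each derived combinator.
newtype Flip a c   = Flip { unFlip :: Id c a }
newtype Cong f a c = Cong { unCong :: Id (f a) (f c) }
newtype Wrap a     = Wrap { unWrap :: a }

-- | Symmetry, with p c = Id c a.
sym :: Id a b -> Id b a
sym eq = unFlip (subst eq (Flip Refl))

-- | Transitivity, with p = Id a.
trans :: Id a b -> Id b c -> Id a c
trans ab bc = subst bc ab

-- | Congruence, with p c = Id (f a) (f c).
cong :: Id a b -> Id (f a) (f b)
cong eq = unCong (subst eq (Cong Refl))

-- | Transport of values along an equality.
castWith :: Id a b -> a -> b
castWith eq = unWrap . subst eq . Wrap
```

Only subst pattern matches on Refl; the other combinators are obtained by choosing a suitable predicate, which mirrors the way elimJ and refl are combined in Mella.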

Like Isabelle's proof scripting language Isar, Mella uses two levels of syntax. 
The inner syntax, surrounded by quotation marks, is used for CC+ terms. The 
outer syntax is for proof scripting. For user interaction, the locally nameless 
term representation is extended to a more readable named representation. The 
inner syntax is essentially a simplification of Agda's syntax for terms without 
implicit arguments or mixfix operators. 

Secondly, metavariables are used in Mella. Just as in Agda, these represent
"holes" within terms that can incrementally be filled in, or refined, during
proofs. Metavariables require one final language extension:

$$t ::= \lambda x.\, t \mid \dots \mid\ ?.$$

As an example, consider checking that the term $\lambda\, ?$ has type $\Pi\, *.\, \Pi\, 0.\, 1$.

$$\frac{\dfrac{}{\Gamma; \Delta \vdash * :_\uparrow \Box_0}\,\textsc{T-Axiom} \qquad \Gamma; \Delta, * \vdash\ ? :_\downarrow \Pi\, 0.\, 1}{\Gamma; \Delta \vdash \lambda\, ? :_\downarrow \Pi\, *.\, \Pi\, 0.\, 1}$$

When we try to check that $? :_\downarrow \Pi\, 0.\, 1$, the type checker cannot proceed, so it
stores a continuation which allows type checking to resume once a term for the
metavariable has been supplied. This forms the basis for interactive theorem
proving in Mella.



4 Automated Theorem Proving Technology 

Having outlined the type-theoretic foundations of Mella, we now discuss the 
ATP technology which serves as its proof engine. 

ATP systems have been designed and implemented for many decades, but 
mainly for classical first-order logic with equations. They provide fully auto- 
mated proof search based on sophisticated term orderings, rewriting techniques 
and heuristics. They can often prove mathematical statements of moderate dif- 
ficulty and deal with large hypothesis sets, which makes them ideally suited for 
discharging "trivial" first-order proof goals in ITP systems. A prime example 
of an ATP integration is Isabelle's Sledgehammer tool (cf. [9] for an overview), 
which calls a number of external ATP systems and SMT solvers. A relevance 
filter selects hypotheses for the proof, and the external proof output is internally 
reconstructed to increase trustworthiness. Proof reconstruction is based on the 
Metis tool [TB], an Isabelle-verified automated theorem prover, which replays
the external proof search with the hypotheses used by the external provers. 

An integration of ATP systems into DTP languages is, however, much less
straightforward, as discussed in the introduction. We therefore start with the
simplest case, pure equational logic, for which classical and constructive
reasoning coincide. We integrate the Waldmeister system [TS], which is highly
effective for this fragment and supports sorts.¹

¹ We are using the last publicly available version of Waldmeister, released in 1999.

Waldmeister accepts a set of equations as hypotheses and a single equation
as a conclusion. It also requires a term ordering to use rewriting techniques for 
enhanced proof search. Technically, Waldmeister is based on the unfailing com- 
pletion procedure [I] , a variant of Knuth-Bendix completion [T7] that attempts 
to construct a (ground) canonical term rewrite system from the equational hy- 
potheses. This construction need not be finite, but it is guaranteed that a 
(rewrite) proof of a valid goal can be found in finite time. Apart from efficient 
proof search, Waldmeister offers two additional features that benefit an inte- 
gration into Mella. First, it provides extremely detailed proof output, down 
to the level of positions and substitutions for rewrites in terms. In contrast to 
Sledgehammer's macro-step proof reconstruction that replays proof search, we 
can therefore check individual proof steps efficiently and without search. Sec- 
ond, Waldmeister extracts lemmas from proofs. This memoisation of subproofs 
further enhances proof reconstruction. 

These features can be demonstrated in a simple example from group theory.
Let $(G, \circ, {}^{-1}, 1)$ be a group with carrier $G$, multiplication $\circ$, inversion ${}^{-1}$ and
unit 1. It satisfies axioms of associativity, right identity and right inverse

$$x \circ (y \circ z) = (x \circ y) \circ z, \qquad x \circ 1 = x, \qquad x \circ x^{-1} = 1.$$

Assume that we have implemented groups in Mella and want to prove that
every right inverse is also a left inverse: $x^{-1} \circ x = x \circ x^{-1}$. We then need to
pass the axioms and the proof goal to Waldmeister and let it search for a proof.
Figure 2 shows the Waldmeister input file that corresponds to this proof task.

NAME group
MODE PROOF
SORTS
  ANY
SIGNATURE
  e: -> ANY
  i: ANY -> ANY
  f: ANY ANY -> ANY
  a: -> ANY
ORDERING
  LPO
  i > f > e > a
VARIABLES
  x,y,z : ANY
EQUATIONS
  f(x,e) = x
  f(x,i(x)) = e
  f(f(x,y),z) = f(x,f(y,z))
CONCLUSION
  f(a,i(a)) = f(i(a),a)

Figure 2: Waldmeister group input file

The group signature is declared in prefix notation, using sort ANY, and
functions f: ANY ANY -> ANY, i: ANY -> ANY and e: -> ANY for multiplication,
inverse, and unit. A constant a is also introduced. Waldmeister's term ordering
is declared in the ORDERING block: a lexicographic path ordering (LPO) is
constructed from a precedence on the group signature and the constant a. The
next block declares three variables x, y and z of sort ANY. The EQUATIONS block
lists the group axioms in Waldmeister syntax. Finally, the proof goal is declared 
in Waldmeister syntax for constant a, since universal goals are Skolemised. 

After Waldmeister is called, it returns the proof in Figure 3 within
milliseconds. Here, the --details flag has been set to obtain precise information for
each proof step.

Lemma 1: f(e,i(i(x1))) = x1

    f(e,i(i(x1)))
  = by Axiom 2 RL at 1 with {x1 <- x1}
    f(f(x1,i(x1)),i(i(x1)))
  = by Axiom 3 LR at e with {x3 <- i(i(x1)), x2 <- i(x1), x1 <- x1}
    f(x1,f(i(x1),i(i(x1))))
  = by Axiom 2 LR at 2 with {x1 <- i(x1)}
    f(x1,e)
  = by Axiom 1 LR at e with {x1 <- x1}
    x1

Lemma 2: ...
Lemma 3: ...
Lemma 4: ...

Theorem 1: f(a,i(a)) = f(i(a),a)

    f(a,i(a))
  = by Axiom 2 LR at e with {x1 <- a}
    e
  = by Axiom 2 RL at e with {x1 <- i(a)}
    f(i(a),i(i(a)))
  = by Lemma 4 LR at 2 with {x1 <- a}
    f(i(a),a)

Figure 3: Waldmeister group output file

In the third step of the proof of Lemma 1,

f(x1,f(i(x1),i(i(x1)))) = f(x1,e)

for instance, the right inverse axiom f(x1,i(x1)) = e has been used to rewrite
from left to right the subterm at position 2 by substituting i(x1)
for x1. This level of detail allows efficient micro-step proof reconstruction; the
lemmas generated support proof reconstruction by memoisation. Details of the
communication between Mella and Waldmeister, in particular proof
reconstruction, are covered in Section 7.
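
To illustrate why this level of detail makes replaying a step cheap, the following sketch checks a single rewrite step from its position, direction and substitution, without any search. It is a standalone illustration and not Mella's reconstruction code, which builds and type checks proof terms instead; the types and the function checkStep are hypothetical names introduced here.

```haskell
-- First-order terms: variables and function applications.
data T = V String | F String [T] deriving (Eq, Show)

type Subst = [(String, T)]
type Pos   = [Int]             -- 1-based argument indices, [] is the root

data Dir = LR | RL deriving (Eq, Show)

-- | Apply a substitution simultaneously to all variables of a term.
applySubst :: Subst -> T -> T
applySubst s (V x)    = maybe (V x) id (lookup x s)
applySubst s (F f ts) = F f (map (applySubst s) ts)

-- | The subterm at a position, if the position exists.
subtermAt :: Pos -> T -> Maybe T
subtermAt [] t = Just t
subtermAt (i : is) (F _ ts)
  | i >= 1 && i <= length ts = subtermAt is (ts !! (i - 1))
subtermAt _ _ = Nothing

-- | Replace the subterm at a position, if the position exists.
replaceAt :: Pos -> T -> T -> Maybe T
replaceAt [] new _ = Just new
replaceAt (i : is) new (F f ts)
  | i >= 1 && i <= length ts =
      let (pre, old : post) = splitAt (i - 1) ts
      in (\t -> F f (pre ++ t : post)) <$> replaceAt is new old
replaceAt _ _ _ = Nothing

-- | Check one step: the instantiated source side must occur at the position
-- in the term before the step, and replacing it by the instantiated target
-- side must give the term after the step.
checkStep :: (T, T) -> Dir -> Pos -> Subst -> T -> T -> Bool
checkStep (l, r) dir pos s before after =
  subtermAt pos before == Just (applySubst s from)
    && replaceAt pos (applySubst s to) before == Just after
  where
    (from, to) = case dir of
      LR -> (l, r)
      RL -> (r, l)
```

Applied to the third step of Lemma 1, the check compares the subterm at position 2 with the instantiated left-hand side of Axiom 2 and then rebuilds the term with e in its place, so the cost is proportional to the size of the terms involved.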

5 Implementing CC+ Terms in Haskell 

Users interact with Mella via the Proof General Emacs interface, which is
standard for many ITP systems [2]. User level terms with explicit variables are
parsed to an internal Haskell representation using de Bruijn indices, as
represented by the Haskell data type Index. The complete Haskell implementation
can be found online.² The data type has a field dbInt for the index and another
one, dbName, for the user level variable name. This is useful for pretty-printing.

² http://www.dcs.shef.ac.uk/~alasdair

data Index = DB {dbInt :: Int, dbName :: Text} deriving (Show)

-- Indices are compared by position only; the stored name is ignored.
instance Eq Index where
  (DB n _) == (DB m _) = n == m

Next we provide data types for sorts and terms.

data Sort = Star | Box Word deriving (Show, Eq)

data Term = Sort Sort
          | Unnamed Index
          | Named Text
          | Pi Tag Term Term
          | Ann Term Term
          | App Term Term
          | J Term Term Term Term Term Term
          | Id Term Term Term
          | Lam Tag Term
          | Refl
          | Meta Int
          deriving (Eq, Show)

Tags are used to attach additional information to terms. Specifically, for Lam 
and Pi terms, they store the associated user level variables, for instance to 
provide meaningful error messages. Tags are not relevant for term equality. 

Terms can be β-reduced using the nf function. The function for shifting is
implemented as follows: 

shift :: Int -> Int -> Term -> Term
-- Indices below the cutoff c are bound locally and stay unchanged.
shift d c (Unnamed (DB n name))
  | n < c  = Unnamed (DB n name)
  | n >= c = Unnamed (DB (n + d) name)
-- Binders increase the cutoff for the term they bind in.
shift d c (Lam tag f)  = Lam tag (shift d (c + 1) f)
shift d c (Pi tag s t) = Pi tag (shift d c s) (shift d (c + 1) t)
shift d c (App f x)    = App (shift d c f) (shift d c x)
shift d c (Ann t ty)   = Ann (shift d c t) (shift d c ty)
shift d c (Id ty a b)  = Id (shift d c ty) (shift d c a) (shift d c b)
shift d c x            = x

Two additional Haskell functions process metavariables. A first function 
generates fresh metavariables as they arise in interactive proofs. A second func- 
tion substitutes user supplied expressions for metavariables. Detailed code can 
be found at our web site. 

The contexts $\Gamma$ and $\Delta$ for named and unnamed variables are implemented
as follows:

data Ctx = Ctx { unnamed :: [(Tag, Term)]
               , named   :: OMap Text (Term, Term)
               }

emptyCtx :: Ctx
emptyCtx = Ctx [] OMap.empty

Since the order in which variables are added to named contexts may matter, a 
custom map data type, OMap, has been implemented to record that information. 

Finally, the set $\mathcal{R}$, which defines the dependencies allowed between types and
terms, is implemented as follows:

setR :: Sort -> Sort -> Sort
setR Star    Star    = Star
setR (Box n) Star    = Star
setR Star    (Box n) = Box n
setR (Box n) (Box m) = Box (max n m)
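
The axioms of the pure type system can be written as a companion function in the same style. The sketch below repeats Sort and setR so that it is self-contained; axiomSort is a hypothetical name introduced here and may not correspond to any function in the Mella sources.

```haskell
data Sort = Star | Box Word deriving (Show, Eq)

-- | The axioms of the PTS, read as a function: * has sort Box 0, and
-- Box i has sort Box (i + 1).
axiomSort :: Sort -> Sort
axiomSort Star    = Box 0
axiomSort (Box n) = Box (n + 1)

-- | The rules of the PTS, as in the listing above: the sort of a dependent
-- product whose domain has sort s1 and whose codomain has sort s2.
setR :: Sort -> Sort -> Sort
setR Star    Star    = Star
setR (Box _) Star    = Star
setR Star    (Box n) = Box n
setR (Box n) (Box m) = Box (max n m)
```

For instance, setR (Box 1) (Box 3) evaluates to Box 3: a product over two universes lives in the higher of the two, whereas a product whose codomain is a proper type always stays in Star.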



6 Type Checking and Inference in Haskell 

Type checking is performed within a type checking monad. The overall approach
is inspired by that of Agda. The type checking monad transformer (TCMT) is
a monad transformer stack consisting of the EitherT monad transformer and
the StateT transformer. The state monad carries the type checking context
(tcmCtx) as well as lists of inference rules used for type checking (tcmTCRules)
and type inference (tcmIRules). It also contains a depth value (tcmDepth) for
tracing and logging the type checking process (tcmLog). Whenever a
metavariable is encountered during type checking, a continuation is added to the state
(in tcmMetas). It contains the information required to resume type checking
once a user supplies a value for it. As mentioned in Section 5, metavariables
must be fresh, so a counter is used for indexing them. The Either monad allows
handling failures; when type checking fails we use it to return TypeError values.

newtype TCMT m a = TCMT
  { unTCMT :: EitherT TypeError (StateT (TCMState m) m) a
  } deriving (Functor, Applicative, Monad, MonadIO)

data TCMState m = TCMState { tcmDepth   :: Int
                           , tcmCtx     :: Ctx
                           , tcmTCRules :: [TCRule m]
                           , tcmIRules  :: [IRule m]
                           , tcmMetas   :: [MetaContinuation]
                           , tcmLog     :: [LogEntry]
                           , tcmCounter :: Counter
                           }

data MetaContinuation = MC Ctx Int Term
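
The behaviour of such a stack can be tried out in isolation. The following sketch is a simplified stand-in, not the TCMT definition above: it uses ExceptT from the transformers package in place of EitherT, a pure State monad at the bottom, and a two-field state. The names St, TC, freshMeta, failWith and runTC are assumptions made for this illustration.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import Control.Monad.Trans.State (State, get, put, runState)

newtype TypeError = TypeError String deriving (Eq, Show)

-- Simplified checker state: a counter for fresh metavariables and the
-- metavariables that still await a user-supplied term.
data St = St { counter :: Int, pending :: [Int] } deriving (Eq, Show)

-- Errors on the outside, state on the inside.
type TC = ExceptT TypeError (State St)

-- | Allocate a fresh metavariable index and record it as pending.
freshMeta :: TC Int
freshMeta = do
  st <- lift get
  let n = counter st
  lift (put (st { counter = n + 1, pending = n : pending st }))
  return n

-- | Abort type checking with an error.
failWith :: String -> TC a
failWith = throwE . TypeError

-- | Run a computation from the empty state.
runTC :: TC a -> (Either TypeError a, St)
runTC m = runState (runExceptT m) (St 0 [])
```

Because the error layer sits on top of the state layer, the state survives a failure: after one freshMeta followed by failWith, runTC returns the error together with a state in which the counter is 1. This ordering is what lets an interactive session keep its metavariables after a failed step.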



Type checking rules have the form Term -> Term -> TCMT m Bool, where,
as mentioned in Section 2, both the term and its tentative type are provided
as inputs. A rule returns True if the terms type check, and False if
the rule does not apply (in which case another rule will be selected). It fails with
TypeError if a term does not type check. Type inference rules require only a
term $t$ as an input. They return Just T when $t :_\uparrow T$, and Nothing when the
inference rule cannot be applied; TypeError is raised when the rule fails.

data TCRule m = TCR { ruleName :: Text
                    , rule     :: Term -> Term -> TCMT m Bool
                    }

data IRule m = IR { inferRuleName :: Text
                  , inferRule     :: Term -> TCMT m (Maybe Term)
                  }

Two Haskell functions are used for type checking and type inference with
the TCMT monad. The typecheck function takes two terms as arguments and
attempts to apply a type checking rule. It returns Nothing if no
applicable rule can be found, and the name of the rule otherwise. The hasType
function is similar, but simply fails if no rule can be applied. It is usually called
as an infix function, and allows rules to be written in a more declarative fashion.

typecheck :: (Functor m, Monad m) => Term -> Term -> TCMT m (Maybe Text)
typecheck t1 t2 = do
  tRules <- tcmTCRules <$> get
  foldM tryRule Nothing tRules
  where
    -- Once a rule has succeeded, the remaining rules are skipped.
    tryRule (Just name) _ = return (Just name)
    tryRule Nothing (TCR name rule) = do
      r <- rule t1 t2
      return $ if r then Just name else Nothing

The infer function attempts to infer the type of its argument. It fails if no
inference rule can be applied and returns the inferred type otherwise.

infer :: (Functor m, Monad m) => Term -> TCMT m Term
infer t | inf t = do
  iRules <- tcmIRules <$> get
  r <- foldM tryRule Nothing iRules
  case r of
    (Just t) -> return t
    Nothing  -> __ERROR__ "infer" [("t", t)]
                  "no rule could be applied to infer the type of\n{t}."
  where
    -- Once a rule has produced a type, the remaining rules are skipped.
    tryRule (Just t) _ = return (Just t)
    tryRule Nothing (IR name rule) = rule t

As an example of a type checking rule, the code for the T-Abs rule from
Figure 1 is shown below. Pattern matching and guards are used to restrict
the terms it can be applied to. Each line in the do block then imposes such
a condition. validType checks that argType is either a proper type or a sort,
while the next line checks that the body of the lambda expression has the correct
type. If both these conditions hold, the rule can be applied and True is returned.

tAbs :: (Functor m, Monad m) => Term -> Term -> TCMT m Bool
tAbs (Lam tag expr) pi@(Pi _ argType exprType) | inf pi = do
  validType argType
  withUnnamedVar tag argType $ expr `hasType` exprType
  return True
-- The rule does not apply to any other combination of term and type.
tAbs _ _ = return False

tAbsRule :: (Functor m, Monad m) => TCRule m
tAbsRule = TCR "T-Abs" tAbs

This Haskell infrastructure suffices to implement the CC+ part of Mella.
The ATP integration is described in the next section. 

7 ATP Integration 

Our general approach to ATP integration is depicted in Figure 4. Mella proof
tasks are represented as judgements $\Gamma; \Delta \vdash\ ? : T$. They encode that from a set
of hypotheses given by the contexts $\Gamma$ and $\Delta$ a proof term $t$, represented by
the metavariable ?, of type $T$ (the proof goal) is to be inferred. This is achieved
by serialising $\Gamma$, $\Delta$ and $T$ and passing them on to Waldmeister. In our group
example, $\Gamma$ and $\Delta$ contain the group axioms, whereas $T$ contains the proof goal.
More generally, the contexts can also contain lemmas that have been proved
before. If Waldmeister fails to find a proof within a certain time limit, the user is
notified. Otherwise, its proof output is translated into a proof term $t$ in Mella,
which is then type checked. Since Waldmeister produces intermediate lemmas,
as we have seen, an additional context $\Gamma'$ is added to $\Gamma$. Constructing a proof
term from a Waldmeister proof and type checking it yields proof reconstruction.
We now discuss the individual steps in more detail.

A Mella file consists of a list of commands delimited by periods, each of 
which can be processed and undone individually by Proof General. There are 
about 20 commands available to the user, which can be displayed using the 
commands command. The help command provides documentation for every
command in the system. The command fun introduces a new top-level function 
or value. The following commands, for instance, introduce an identity function 
and a constant function in Mella. 

fun id : "(A : *) -> A -> A" 
"\_ x -> x". 

fun const : " (A B : *) -> A -> B -> A" 
"\_ x _ -> x" . 

To declare a theorem and start a proof, the theorem command is used. It
takes the name of the theorem and its type $T$. To prove the theorem, the user
must construct a proof term $t$ such that $t :_\downarrow T$. Proofs are built up incrementally
from commands and terms that may themselves contain metavariables.

As an example, assume we want to prove that

$$f(x, g(y, g(x, z))) = x \quad\text{and}\quad g(x, f(y, f(x, z))) = x$$

imply

$$f(x, g(y, x)) = x.$$

A "manual" Mella proof without using Waldmeister is as follows:

theorem example : "(A : *) (f g : A -> A -> A)
  -> (axiom1 : (x y z : A) -> Id A (f x (g y (g x z))) x)
  -> (axiom2 : (x y z : A) -> Id A (g x (f y (f x z))) x)
  -> (x y t1 t2 : A) -> Id A (f x (g y x)) x".
intro A f g ax1 ax2 x y t1 t2.
= "f x (g y (g x (f t1 (f x t2))))" by "ax2 x t1 t2" at '2,2RL'.
= "x" by "ax1 x y (f t1 (f x t2))".
refl.
qed.

normalize proof.
describe proof.

Mella commands can be terms, which are surrounded by quotation marks,
theorem definitions, function definitions, or command expressions. The command
intro args generates a term of the form $\lambda\, \mathit{args} \to\ ?$. The command

= "f x (g y (g x (f t1 (f x t2))))" by "ax2 x t1 t2" at '2,2RL'.

says that the left-hand term in the proof goal is equal to the term provided
by applying the second axiom at position 2,2 to variable x from right to left,
in a notation similar to Waldmeister. The second proof step is similar. The
remaining step is reflexivity of equality. The commands normalize proof and
describe proof normalise the proof and print out the proof term (which we
do not show). We can also use the agda command to compile Mella files into
Agda files. This is very useful for testing the correctness of our implementation.

Command expressions form a large part of Mella's syntax. Examples are 
intro, = and qed as displayed above. Commands consist of a command name, 
followed by zero or more arguments and a list of keywords. Each keyword can 
again be associated with a list of arguments: 

command arg_1 ... arg_n :keyword1 karg_1 ... karg_n :keyword2 ...

Alternatively to the above manual proof we can use Waldmeister to prove 
our goal. 

theorem example : 

intro A f g ax1 ax2 x y.
waldmeister :signature f g x y :axioms ax1 ax2 :kbo :timeout 2.
qed.

The waldmeister command is now used to instantiate the metavariable opened
by the intro command. Waldmeister is given the functions and values it may
use in the proof via the :signature keyword, which maps to the SIGNATURE
section of the Waldmeister input file. The axioms to be used when constructing
the proof are listed after the :axioms keyword, and are used in the EQUATIONS
section of the Waldmeister input file. The :kbo option tells Waldmeister to
use a Knuth-Bendix ordering as the syntactic ordering for terms (based on
the precedence given by the order of expressions declared after :signature).
Finally, the :timeout keyword lets one specify the amount of time Waldmeister
will be given for proof search.

We now describe proof reconstruction. As already mentioned, Waldmeister 
splits proofs into lemmas. While this process is primarily intended to increase 
readability, it also enhances proof reconstruction by memoising subproofs. 

The Waldmeister output for the example proof above is shown below. Waldmeister 
renames variables in its output, so during reconstruction, the renamed 
variables must be matched with the correct variables within Mella. The 
proof shows that the term s5(s1,s4(s0,s1)) is equal to s1. It consists of two 
steps: First, Waldmeister applies Axiom 2 from right to left at position 2.2 in 
s5(s1,s4(s0,s1)), which results in the term shown on the next line. Secondly, 
Waldmeister uses Axiom 1 to reduce the term down to s1, proving the goal. 

Theorem 1: s5(s1,s4(s0,s1)) = s1 

s5(s1,s4(s0,s1)) 

by Axiom 2 RL at 2.2 with {x3 <- y, x2 <- z, x1 <- s1} 
s5(s1,s4(s0,s4(s1,s5(z,s5(s1,y))))) 

by Axiom 1 LR at e with {x3 <- s5(z,s5(s1,y)), x2 <- s0, x1 <- s1} 

s1 
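
The two rewrite steps in this output can be replayed mechanically. The Python 
sketch below is an illustration under stated assumptions, not Mella's code: terms 
are nested tuples (symbol, argument, ...), variables and constants are strings, 
and the numbering of the axiom variables x1, x2, x3 is inferred from the 
substitutions printed by Waldmeister. All function names are hypothetical. 

```python
def substitute(term, subst):
    """Apply a substitution (dict from variable names to terms) to a term."""
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(substitute(arg, subst) for arg in term[1:])


def subterm_at(term, position):
    """Return the subterm at a position such as (2, 2); () is the root."""
    for index in position:
        term = term[index]  # term[0] is the symbol, so arguments are 1-based
    return term


def replace_at(term, position, new):
    """Return a copy of term with the subterm at position replaced by new."""
    if not position:
        return new
    index = position[0]
    inner = replace_at(term[index], position[1:], new)
    return term[:index] + (inner,) + term[index + 1:]


def apply_step(term, axiom, direction, position, subst):
    """Replay one step: rewrite with an axiom instance at the given position."""
    lhs, rhs = axiom if direction == "LR" else (axiom[1], axiom[0])
    if subterm_at(term, position) != substitute(lhs, subst):
        raise ValueError("axiom instance does not match the selected subterm")
    return replace_at(term, position, substitute(rhs, subst))


# Axioms in Waldmeister's renamed signature (s5 stands for f, s4 for g).
AXIOM_1 = (("s5", "x1", ("s4", "x2", ("s4", "x1", "x3"))), "x1")
AXIOM_2 = (("s4", "x1", ("s5", "x2", ("s5", "x1", "x3"))), "x1")

goal = ("s5", "s1", ("s4", "s0", "s1"))
step1 = apply_step(goal, AXIOM_2, "RL", (2, 2),
                   {"x3": "y", "x2": "z", "x1": "s1"})
step2 = apply_step(step1, AXIOM_1, "LR", (),
                   {"x3": ("s5", "z", ("s5", "s1", "y")), "x2": "s0", "x1": "s1"})
```

Here step1 is the intermediate term s5(s1,s4(s0,s4(s1,s5(z,s5(s1,y))))) and step2 
is s1. Checking each axiom instance against the selected subterm before rewriting 
is what makes a malformed step fail early instead of producing a wrong term. 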

To prove a goal x = y, each step of a proof applies a lemma or axiom 
to a subterm of x. In Mella this requires us to use the inference rules for 
congruence (to select the subterm) and symmetry (to choose the direction). 
If neither congruence nor symmetry is required for a step, they are omitted 
from the proof output, as is the case for the second step above. The above 
Waldmeister proof has two steps, hence we need to use transitivity to join both 
steps together, resulting in the final reconstructed Mella proof term below. 
This proof term is somewhat unreadable; it has been indented to make the 
structure of the proof clearer. 

trans A (f x (g y x)) (f x (g y (g x (f y (f x y))))) x 
  (cong A A x (g x (f y (f x y))) (\rc-cong-var -> f x (g y rc-cong-var)) 
    (sym A (g x (f y (f x y))) x 
      (ax2 x y y))) 
  (ax1 x y (f y (f x y))) 
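
The way symmetry, congruence and transitivity are layered in this term can be 
sketched as string assembly. The Python code below is a hedged illustration, not 
the reconstruction code of Mella: the argument order of trans, cong and sym is 
read off the proof term printed above, while the dictionary keys and function 
names are hypothetical. 

```python
def paren(text):
    """Parenthesise a term unless it is a single identifier."""
    return text if " " not in text else "(" + text + ")"


def step_proof(step):
    """Wrap an axiom application in sym and cong only where needed."""
    proof = step["axiom"]                  # proves inst_lhs = inst_rhs
    a, b = step["inst_lhs"], step["inst_rhs"]
    if step["direction"] == "RL":          # symmetry selects the direction
        proof = f"sym A {paren(a)} {paren(b)} ({proof})"
        a, b = b, a
    if step["context"] is not None:        # congruence selects the subterm
        proof = f"cong A A {paren(a)} {paren(b)} ({step['context']}) ({proof})"
    return proof


def chain(terms, proofs):
    """Join step proofs with transitivity; proofs[i] proves terms[i] = terms[i+1]."""
    if len(proofs) == 1:
        return proofs[0]
    rest = chain(terms[1:], proofs[1:])
    return (f"trans A {paren(terms[0])} {paren(terms[1])} {paren(terms[-1])} "
            f"({proofs[0]}) ({rest})")


steps = [
    {"axiom": "ax2 x y y", "direction": "RL",
     "inst_lhs": "g x (f y (f x y))", "inst_rhs": "x",
     "context": r"\rc-cong-var -> f x (g y rc-cong-var)"},
    {"axiom": "ax1 x y (f y (f x y))", "direction": "LR",
     "inst_lhs": "f x (g y (g x (f y (f x y))))", "inst_rhs": "x",
     "context": None},
]
terms = ["f x (g y x)", "f x (g y (g x (f y (f x y))))", "x"]
proof_term = chain(terms, [step_proof(s) for s in steps])
```

Up to whitespace, proof_term coincides with the reconstructed term shown above. 
For proofs with more than two steps, this sketch nests trans to the right, 
which is one possible choice among several. 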

8 Proof Experiments 

We tested the Waldmeister integration on 850 proof goals from the TPTP library 
[26], among them 115 on Boolean algebras (BOO), 156 on lattices (LAT), 
415 on groups (GRP), 106 on relation algebras (REL) and 58 on rings (RNG). 
The letters in brackets indicate the name given to these problem sets in TPTP. 
The library contains non-theorems and non-equational theorems that are beyond 
Waldmeister's scope. Hence, in our experiments, Waldmeister could not find 
proofs for all goals: some are out of its reach in principle, and for some 
equational theorems it may have failed due to a timeout. Here, however, we 
are only interested in relative success rates for proof reconstruction, that is, the 
number or percentage of successful Waldmeister proofs that Mella was able to 
reconstruct, and in the running times of proof reconstruction relative to proof 
search. The outcomes of these experiments are shown in Table 1. 

Table 1: Proof Reconstruction Experiments 



The first column in the table shows the TPTP problem sets. The next four 
columns are related to Waldmeister. The first of them shows the Waldmeister 
CPU time limits for proof search: 1s, 5s, 10s, 30s and 300s. The second one 
gives the number of proof searches that exceeded the time limit. The third one 
gives the number of proof searches that aborted, for instance, due to 
out-of-memory errors. In the case of Boolean algebras, the fourth one shows that 
Waldmeister refuted one proof goal. The final three columns contain data on 
proof reconstruction. The first of them shows the number of proofs for which 
reconstruction failed; the second one the number of successfully reconstructed 
proofs. The row-wise sums of these two columns give the numbers of successful 
Waldmeister proofs. The third one gives the percentage of successful proof 
reconstructions. 

First, it turns out that the CPU time limit for Waldmeister has little im- 
pact on success rates. The number of successful Waldmeister proofs increases 
only slightly with proof search time; the success rates for reconstruction remain 
almost unaffected. This suggests that there is little correlation between proof 
search time and the difficulty of reconstructing the resulting proof. Waldmeister 
could spend a long time traversing a search space only to find a very short and 
simple proof which is trivial to reconstruct. 

Second, success rates are surprisingly different for different problem sets. 
For groups, proof reconstruction was particularly poor, succeeding only 45% of 
the time for proofs returned after a 5 second timeout. For rings and relation 
algebras, reconstruction succeeded almost always, with a 100% reconstruction 
success rate at 5 seconds. For lattices and Boolean algebras reconstruction was 
also successful overall, at 80% for Boolean algebras and 76.1% for lattices 
(again with a 5 second timeout). Some explanations for this are given below. 

Figure 5: Waldmeister running times versus proof reconstruction times 
(plot not reproduced; horizontal axis: Waldmeister Running Time (s)) 

Next we have investigated the correlation between proof search and proof 
reconstruction times. A graph is plotted in Figure 5. Unfortunately, these times 
were very short for most of our proofs, which makes it very difficult to draw 
convincing conclusions. For some proofs, proof search took rather long whereas 
reconstruction was fast. In other cases, proof search was fast, but the proof 
could not be reconstructed or type checked efficiently. We have inspected the 
proof for each goal that took longer than 2s to reconstruct. In each of these 
cases, either proof terms are extremely long, with more than 100 lemmas, or 
there are extremely large substitutions. 

As an example, consider the following line from Figure 3: 

= by Axiom 2 LR at 2 with {x1 <- i(x1)} 

In the substitution x1 <- i(x1), for instance, the term i(x1) can be enormous. 
In fact, our experiments contain substitutions of terms thousands of characters 
long, resulting in extremely large and unwieldy lemmas. This underscores the 
benefit of Waldmeister's lemma generation, which allows us to type check each 
one individually. As soon as proof terms become large, type checking slows 
down. These observations confirm what one would expect: proof reconstruction 
times depend on proof sizes rather than proof search times, whereas proof search 
time and proof size are often only weakly correlated. Proof length, however, is 
not a key factor for using ATP systems in DTP program development. Ulti- 
mately, our experiments suggest that the Waldmeister integration into Mella 
is feasible, and proof reconstruction yields little overhead to proof search. 

There are several reasons why proof reconstruction may fail. Firstly, Waldmeister 
sometimes introduces fresh Skolem constants in proofs. These currently 
cannot be handled by the proof reconstruction code and cause it to fail. More 
precisely, such constants, which are dynamically generated by Waldmeister, can 
currently not be associated with an environment during proof reconstruction. 
Secondly, rules such as the right inverse axiom x ∘ x⁻¹ = 1 for groups, when 
applied from right to left to a (sub)term 1, can lead to "inventing" fresh variables 
x in a Waldmeister proof. Mella would then have to introduce this value to 
the type signature of the lemma and supply it as a parameter. At present, such 
steps lead to lemmas being assigned the wrong types in proofs, which causes 
proof reconstruction to fail. For certain problem sets such as groups, "creative" 
proof steps of this kind seem particularly frequent, whereas in others (such as 
Boolean algebras, relation algebras or rings), they are present, but seem less 
significant. 

As an example, consider the proof discussed in Section 7: 

theorem proof : "(A : *) (f g : A -> A -> A) 
  -> (axiom1 : (x y z : A) -> Id A (f x (g y (g x z))) x) 
  -> (axiom2 : (x y z : A) -> Id A (g x (f y (f x z))) x) 
  -> (x y : A) -> Id A (f x (g y x)) x". 
intro A f g ax1 ax2 x y. 

waldmeister :signature f g x y :axioms ax1 ax2 :kbo :timeout 2. 
qed. 

Waldmeister uses the following lemma in its proof: 

Lemma 1: s10(x1,s14(x2,x1)) = x1 

s10(x1,s14(x2,x1)) 
= by Axiom 7 RL 

s10(x1,s14(x2,s14(x1,s10(z,s10(x1,y))))) 
= by Axiom 8 LR 

x1 

The second line of this proof introduces the new variables y and z. They are 
not mentioned in that lemma's type, hence the lemma cannot be easily 
reconstructed. We have implemented heuristics that guess instances of correct type 
for z and y (in this case x1 and x2) which are present in the context. In this 
particular lemma, these heuristics make proof reconstruction succeed. In many 
other cases, we still obtain confusing error messages. 
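
The situation in this lemma can be illustrated with a small Python sketch. It is 
not the heuristic implemented in Mella, whose details are not given here. It 
assumes a single sort, treats names starting with x, y or z as variables, and 
assigns context variables to invented variables in an arbitrary fixed order. All 
function names are hypothetical. 

```python
from itertools import cycle


def variables(term):
    """Collect variable names; by assumption they start with x, y or z."""
    if isinstance(term, str):
        return {term} if term[0] in "xyz" else set()
    found = set()
    for arg in term[1:]:               # term[0] is the function symbol
        found |= variables(arg)
    return found


def invented_variables(lhs, rhs, intermediate_terms):
    """Variables used inside the proof but absent from the lemma statement."""
    declared = variables(lhs) | variables(rhs)
    used = set()
    for term in intermediate_terms:
        used |= variables(term)
    return used - declared


def guess_instances(invented, candidates):
    """Assign a context variable to each invented variable (arbitrary order)."""
    if invented and not candidates:
        raise ValueError("no candidate of the right type in the context")
    return dict(zip(sorted(invented), cycle(candidates)))


def instantiate(term, subst):
    """Replace invented variables by their guessed instances."""
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(instantiate(arg, subst) for arg in term[1:])


lhs = ("s10", "x1", ("s14", "x2", "x1"))
rhs = "x1"
middle = ("s10", "x1", ("s14", "x2",
          ("s14", "x1", ("s10", "z", ("s10", "x1", "y")))))

invented = invented_variables(lhs, rhs, [middle])
guess = guess_instances(invented, ["x1", "x2"])
closed = instantiate(middle, guess)
```

For Lemma 1 the invented set is {y, z}, and after instantiation the intermediate 
term mentions only x1 and x2, so it can be typed in the context of the lemma. 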

9 Related Work 

The general question of proof automation for ITPs is covered in a wide variety 
of literature. Barendregt and Barendsen [5] identify three approaches, namely 
accepting, skeptical, and autarkic. The accepting approach uses ATPs and SMT 
solvers as oracles, requiring no proof output. The skeptical approach requires 
that external tools provide evidence or certificates which allow ITP systems to 
internally reconstruct external proofs to increase trust. The autarkic approach 
relies solely on internal implementations of solvers and provers or, alternatively, 
on verifying external tools. 

The accepting approach has, for many years, been pursued in the PVS ITP 
system, for instance by integrating the Yices SMT solver [25]. However, this 
is often insufficient for constructive logic as proofs have computational content 
and may require execution. 

The autarkic approach is the ideal, as an internally verified solver is guaran- 
teed to produce correct output. The omega, tauto and ring tactics in Coq, and 
Isabelle's blast and metis tactics for instance, are autarkic. The disadvantage 
of this approach however is clear: there is a need to efficiently re-implement 
provers in the proof system. 

The approach taken in this paper is skeptical. We believe this 
yields an adequate balance between efficiency and trust. Our approach is heavily 
inspired by Isabelle's Sledgehammer tool, which however is predominantly 
based on macro-step proof reconstruction. Additionally, ATP integration in 
Mizar, so far without proof reconstruction, is currently under development 
[24]. The skeptical approach has also been used in the context of dependent 
types, in a Waldmeister integration into Agda [13]. The relative inefficiency of 
this approach due to Agda proof normalisation is another main inspiration for 
Mella. Work on proof irrelevance in the most recent version of Agda may, 
however, lead to a solution to this problem within Agda. More recently, using 
the skeptical approach, an SMT solver has been integrated into Coq PQ. 

10 Conclusion and Future Work 

We have integrated the equational theorem prover Waldmeister into the 
prototypical dependently typed programming language Mella, which is based on 
the extended calculus of constructions with universes. In contrast to previous 
approaches, where theorem provers were added a posteriori to existing ITP 
systems to complement existing internal tactics and proof strategies, we take the 
ATP system as a core proof engine for the programming language and build the 
language around it. As a user front end we have implemented a proof scripting 
language in the Proof General environment. This provides an interface between 
Mella and Waldmeister. Since Waldmeister provides highly detailed proof 
output we can perform micro-step proof reconstruction, translating the proof 
output into a Mella proof term and type checking that term. 

Proof terms in Mella are not normalised. On the one hand, this makes 
proof reconstruction much more efficient. On the other hand this yields a proof 
certificate rather than a proper normalised proof. The strong normalisation 
property of the underlying type system, however, guarantees that all proofs 
that have been successfully checked can also be normalised. In the case of an 
equational proof this amounts to a refl term. 

In sum, our findings suggest that integrating ATP systems into DTP languages 
can be very beneficial for program development in this setting, and that 
the approach taken with Mella may serve as a template for future approaches 
to integrate more expressive ATP systems in more sophisticated DTP languages. 
There are various interesting directions for future work. 

First, even the minimalist formalism of Mella without recursion or data 
types requires proofs in full multi-sorted first-order constructive logic with 
equations. However, state-of-the-art ATP systems are essentially all based on 
classical first-order logic and often do not support sorts. Our current Waldmeister 
integration deals only with multi-sorted equational logic, a fragment where 
classical and constructive reasoning coincide. While using classical ATP systems 
for more expressive fragments of first-order logic, such as Harrop formulae, is 
still possible, specific ATP systems for constructive or intuitionistic logic should 
be designed for applications in DTP. 

Second, many state-of-the-art ATP systems adhere to a common input standard 
(TPTP), but many of them do not provide any detailed proof output or 
use a proprietary format. Detailed proof output is often perceived as detrimental 
to proof search efficiency. In the context of DTP, however, its absence is 
detrimental to proof reconstruction. As Sledgehammer shows, macro-step proof 
reconstruction, that is, replaying proof search with an internally verified 
theorem prover, has the disadvantage that many proofs provided by the external 
ATPs will not be accepted by the ITP system. Our proof experiments show 
that micro-step reconstruction of individual proof steps is superior to this 
approach, but it requires detailed ATP output. Proof standardisation as in the 
TSTP project [27] is a valuable step in this direction. While sheer proof power 
was the main emphasis of ATP development in the past, applications in the 
context of ITP systems require this to be balanced with detailed proof output 
and support for types. 

Third, in its current version, Mella still suffers from the fact that Wald- 
meister proofs which introduce new constants or variables cannot always be 
reconstructed. We could work around this by reconstructing proofs as they 
are, with additional constants and variables included, and proving that such 
reconstructed proofs are equivalent to the desired proofs. The simple heuris- 
tics currently used should further be refined to cover more proofs. Alternatively, 
when heuristics fail, the presence of lemmas in Waldmeister proof outputs allows 
local manual proof reconstruction. Often, reconstruction failures are caused by 
a very small number of lemmas. These could be replaced by metavariables so 
that the proof can be delegated to users. Thus, even when an ATP system 
cannot completely finish a proof, it might still produce a number of simpler 
proof goals for the user and at least simplify the global proof goal. 

Fourth, Mella needs to be extended with features found in more sophisticated 
DTP and ITP tools. First we could extend CC+ with data types, 
induction or Σ-types. Alternatively we could extend the proof scripting language 
by adding more automation, or by providing a more structured method 
of proof construction, similar to Isar. Some features like induction might only 
require proof management such as induction tactics, and would not affect the 
ATP integration, while others, such as the addition of Σ-types, seem to require 
modifications to how ATP systems are integrated. 